DOCUMENT RESUME

ED 135 799                                        TM 005 666

AUTHOR          Brown, Thomas A.
TITLE           Admissible Scoring Systems for Continuous Distributions.
INSTITUTION     Rand Corp., Santa Monica, Calif.
PUB DATE        Aug 74
NOTE            27p.
AVAILABLE FROM  The Rand Corporation, 1700 Main Street, Santa Monica, California 90406 (P-5235, $1.50)
EDRS PRICE      MF-$0.83 HC-$2.06 Plus Postage.
DESCRIPTORS     *Mathematical Models; *Prediction; *Probability; *Scoring; *Statistics
IDENTIFIERS     *Admissible Probability Testing



ABSTRACT

The defining property of an admissible scoring system is that any individual perceives himself as maximizing his expected score by reporting his true subjective distribution. The use of admissible scoring systems as a measure of probabilistic forecasts is becoming increasingly well-known in those cases where the forecast is a discrete distribution over a finite number of alternatives. Most serious forecasts which are made in the real world seem to be forecasts of quantities, rather than choices between a finite number of alternatives. In such cases as this, it seems much more natural to ask the forecaster to specify a continuous probability distribution which represents his expectations rather than trying to re-cast a basically continuous process into a discrete one. To construct an admissible scoring system for a continuous distribution, a collection of possible bets can be postulated on a continuous variable, and an admissible scoring system can be constructed as the net pay-off to a forecaster who takes all bets (and only those bets) which appear favorable on the basis of his reported distribution. Mathematical models for this and alternative systems are presented. (Author/BW)



Documents acquired by ERIC include many informal unpublished materials not available from other sources. ERIC makes every effort to obtain the best copy available. Nevertheless, items of marginal reproducibility are often encountered and this affects the quality of the microfiche and hardcopy reproductions ERIC makes available via the ERIC Document Reproduction Service (EDRS). EDRS is not responsible for the quality of the original document. Reproductions supplied by EDRS are the best that can be made from the original.







ADMISSIBLE SCORING SYSTEMS FOR 
CONTINUOUS DISTRIBUTIONS 



Thomas A. Brown 



August 1974 






U.S. DEPARTMENT OF HEALTH, EDUCATION & WELFARE
NATIONAL INSTITUTE OF EDUCATION

THIS DOCUMENT HAS BEEN REPRODUCED EXACTLY AS RECEIVED FROM THE PERSON OR ORGANIZATION ORIGINATING IT. POINTS OF VIEW OR OPINIONS STATED DO NOT NECESSARILY REPRESENT OFFICIAL NATIONAL INSTITUTE OF EDUCATION POSITION OR POLICY.



P-5235 






The Rand Paper Series 

Papers are issued by The Rand Corporation as a service to its professional staff. Their purpose is to facilitate the exchange of ideas among those who share the author's research interests; Papers are not reports prepared in fulfillment of Rand's contracts or grants. Views expressed in a Paper are the author's own, and are not necessarily shared by Rand or its research sponsors.

The Rand Corporation 

Santa Monica, California 90406 




I. INTRODUCTION



The use of admissible scoring systems as a measure of probabilistic forecasts is becoming increasingly well-known in those cases where the forecast is a discrete distribution over a finite number of alternatives (e.g.: Will it rain or not? Will Dewey, Truman, Wallace, or Thurmond be elected? Will the Rams or the Vikings win the game?). The defining property of an admissible scoring system is that any individual perceives himself as maximizing his expected score by reporting his true subjective distribution. That is to say, if you want to beat the system the best way to do it is to be honest.

Most serious forecasts which are made in the real world seem to be forecasts of quantities (e.g.: What will be the total U.S. wheat production during 1974? How many tanks will there be in the Egyptian Army on July 1, 1975? What will be the Dow-Jones average on January 2, 1976?) rather than choices between a finite number of alternatives. In such cases as this, it seems much more natural to ask the forecaster to specify a continuous probability distribution which represents his expectations rather than trying to re-cast a basically continuous process into a discrete one. But how can we construct an admissible scoring system for a continuous distribution? There are three basic approaches which seem to work:

(1) We can regard the continuous distribution as the limit of a discrete one, and derive a continuous admissible scoring system as the limit of a sequence of discrete ones.

(2) We can create continuous admissible scoring systems by exploiting the Schwarz inequality, or by using other well-known inequalities of mathematical analysis.

(3) We can postulate a collection of possible bets on a continuous variable, and construct an admissible scoring system as the net pay-off to a forecaster who takes all bets (and only those bets) which appear favorable on the basis of his reported distribution. This is an exact analogue to the "gambling house" construction method which may be used to discover discrete admissible scoring systems.

Of the three techniques, I tend to prefer the third because it gives greater insight into what actually lies behind an admissible scoring system, and suggests ways to tailor the scoring system to accomplish one's goals more effectively in a given situation. But let us discuss each of the methods in turn.



II. LIMITS OF DISCRETE SCORING SYSTEMS



Suppose that the domain of possible answers to a forecasting question is an interval D. For example, if we are forecasting the temperature at noon on May 1 in Santa Monica we might take D to be all temperatures between 32° and 130°. Suppose a forecaster specifies a probability density function r(x)dx over D which he asserts is his best subjective estimate of what the temperature will be. How should we reward him when the true temperature becomes known?

We could convert the problem into a discrete one by the following device: divide D into n small intervals, each of length Δx. Choose a set of n points {x_i} in such a way that x_i is in the i-th interval. If r(x) is continuous and Δx is small enough, then the forecaster is asserting (approximately) that there is a probability r(x_i)Δx that the true temperature is in the i-th interval. This is a probabilistic forecast over a finite number (n) of alternatives, so we could use any discrete admissible scoring system on it. For the sake of definiteness, let us apply the quadratic admissible scoring system. This means that if the true answer is in the i-th interval, the forecaster would be rewarded as follows:

$$f_n(i) = 2\,r(x_i)\,\Delta x - \sum_{j=1}^{n} \left[ r(x_j)\,\Delta x \right]^2$$

Note that the pay-off becomes small if n is large (since Δx is then small). Recall that if you multiply an admissible scoring system by a positive constant, you get another admissible scoring system. Therefore we have another admissible scoring system if we "renormalize" the one above by dividing out Δx:

$$f_n^*(i) = 2\,r(x_i) - \sum_{j=1}^{n} \left[ r(x_j) \right]^2 \Delta x$$

If r(x) is continuous, then as we let n → ∞ the sequence of scores f_n^*(i) clearly goes to a limit f(t), where

$$f(t) = 2\,r(t) - \int \left[ r(x) \right]^2 dx$$



In this expression, t stands for the "true answer." Because of the way in which it was constructed, we would expect the expression above to be an admissible scoring system on continuous distributions: any deviation from the true subjective r(x) would be reflected in a less than optimal score on the f_n^* for n sufficiently large, and therefore (one would think) in a less than optimal score on f. But to make this argument rigorous is somewhat cumbersome. A much more efficient way to provide a rigorous proof that f(t) is admissible is to invoke the Schwarz inequality, which we shall do in the next section.
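The limiting process can be checked numerically. The following Python sketch is only an illustration under stated assumptions: the triangular density on [0, 2], the choice of interval midpoints for the points x_j, the true answer 0.73, and all function names are hypothetical and do not come from the paper.

```python
def density(x):
    """Hypothetical reported density r(x): triangular on [0, 2], peak at x = 1."""
    if 0.0 <= x <= 1.0:
        return x
    if 1.0 < x <= 2.0:
        return 2.0 - x
    return 0.0


def renormalized_discrete_score(t, n, lo=0.0, hi=2.0):
    """f*_n(i) = 2 r(x_i) - sum_j [r(x_j)]^2 * dx, with x_j the interval midpoints."""
    dx = (hi - lo) / n
    mids = [lo + (j + 0.5) * dx for j in range(n)]
    i = min(int((t - lo) / dx), n - 1)  # index of the interval containing t
    return 2.0 * density(mids[i]) - sum(density(x) ** 2 for x in mids) * dx


def continuous_score(t):
    """f(t) = 2 r(t) - integral of r(x)^2 dx; the integral is 2/3 for this density."""
    return 2.0 * density(t) - 2.0 / 3.0


t = 0.73  # hypothetical true answer
gaps = {n: abs(renormalized_discrete_score(t, n) - continuous_score(t))
        for n in (10, 100, 1000, 10000)}
for n, gap in gaps.items():
    print(n, gap)
```

For this density the gap between the discrete and continuous scores is bounded by roughly the interval length, so it shrinks as n grows, although not necessarily monotonically.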

Other discrete admissible scoring systems may be introduced as the quadratic was above, although the details of the renormalization process differ from case to case. If you use the logarithmic discrete scoring system the continuous system derived is simply

$$g(t) = \log r(t)$$

If you use the "spherical" scoring system of Masanao Toda, then the corresponding continuous system is

$$h(t) = \frac{r(t)}{\sqrt{\int \left[ r(x) \right]^2 dx}}$$

This limiting process is a good way to discover continuous scoring systems, but it is a poor way to prove that a scoring system has the "admissible property."



III. EXPLOITING SOME WELL-KNOWN INEQUALITIES



Once a continuous admissible scoring system has been discovered (by means of a limiting process applied to a discrete admissible scoring system, or otherwise), it is usually not very difficult to provide a rigorous proof that the scoring system is, in fact, admissible by applying more or less well-known theorems and techniques from the field of integral inequalities. By way of illustration, let us begin by attacking the quadratic scoring system f(t), mentioned above. If we let s(t)dt denote the "true" probability density function, and r(t)dt denote the probability density function specified by the respondent, then we must prove that the respondent will maximize his expected score by making r(t) = s(t). Putting this into symbols, we must prove that

$$\int 2\,s(t)\,r(t)\,dt - \int s(t)\,dt \int r(x)^2\,dx \;\le\; \int 2\,s(t)^2\,dt - \int s(t)\,dt \int s(x)^2\,dx$$

Since s(t)dt is a probability density function, we have

$$\int s(t)\,dt = 1$$

Therefore we seek to show

$$\int 2\,s(t)\,r(t)\,dt - \int r(x)^2\,dx \;\le\; \int s(x)^2\,dx$$

which is the same as showing

$$0 \;\le\; \int \left[ s(t) - r(t) \right]^2 dt$$

Since the above inequality obviously holds, and is a strict inequality unless s(t) = r(t) except on a set of measure zero, it follows that the "quadratic" continuous scoring system does indeed have the admissible property.
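The inequality can be illustrated numerically. The Python sketch below is not part of the paper: the two densities (a uniform "true" density and a skewed report on [0, 1]), the midpoint-rule integrator, and all names are assumptions chosen for illustration. It shows that the loss in expected quadratic score from misreporting equals the integral of [s(t) - r(t)]^2.

```python
def integrate(f, lo, hi, n=20000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    dx = (hi - lo) / n
    return sum(f(lo + (k + 0.5) * dx) for k in range(n)) * dx


def true_density(t):
    return 1.0        # s(t): uniform on [0, 1] (assumed for illustration)


def skewed_report(t):
    return 2.0 * t    # r(t): a different proper density on [0, 1]


def expected_quadratic_score(report):
    """Expected value of f(t) = 2 r(t) - integral r^2 when t is drawn from s."""
    cross = integrate(lambda t: 2.0 * true_density(t) * report(t), 0.0, 1.0)
    penalty = integrate(lambda t: report(t) ** 2, 0.0, 1.0)
    return cross - penalty


honest = expected_quadratic_score(true_density)
dishonest = expected_quadratic_score(skewed_report)
squared_distance = integrate(
    lambda t: (true_density(t) - skewed_report(t)) ** 2, 0.0, 1.0)
print(honest, dishonest, honest - dishonest, squared_distance)
```

With these particular densities the honest report expects a score of 1, the skewed report expects 2/3, and the difference of 1/3 matches the squared distance between the two densities.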






Now let us turn our attention to the "logarithmic" continuous scoring system, g(t). As before, let s(t)dt denote the "true" probability density function, and let r(t)dt denote the one specified by the respondent. Our task is to prove that

$$\int s(t) \log s(t)\,dt \;\ge\; \int s(t) \log r(t)\,dt$$

and that equality holds if and only if s(t) = r(t). But this is the same as proving

$$0 \;\ge\; \int s(t) \log\!\left[ \frac{r(t)}{s(t)} \right] dt$$

We know that

$$\log x \le x - 1$$

with equality holding if and only if x = 1 (indeed, the right-hand side is a tangent line to the left-hand side of the above inequality). Therefore

$$\int s(t) \log\!\left[ \frac{r(t)}{s(t)} \right] dt \;\le\; \int s(t) \left[ \frac{r(t)}{s(t)} - 1 \right] dt = \int r(t)\,dt - \int s(t)\,dt = 1 - 1 = 0$$

The inequality is strict unless r(t)/s(t) = 1 almost everywhere, and thus our desired result is established. Note, by the way, that we made use of the fact that r(t) integrates to one over the whole space, while we did not have to use this fact in establishing that the quadratic continuous scoring system is admissible. Indeed, the logarithmic scoring system as defined here can be "beaten" if you allow the respondent to specify improper distributions (ones which integrate to more than one), while this is not the case with the quadratic scoring system. As a practical matter, this means that any real-world implementation of the logarithmic scoring system must include a test (perhaps followed by a renormalization) to ensure that the respondent is specifying a proper distribution. This test is unnecessary in the case of the quadratic scoring system, for the respondent is only "hurting himself" if he specifies an improper distribution.
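The contrast between the two systems under improper reports can be sketched in Python. Everything specific here is an assumption made for illustration: the true density s(t) = 2t on [0, 1], the improper report c·s(t) with c = 2, the midpoint-rule integrator, and the function names.

```python
import math


def integrate(f, lo, hi, n=20000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    dx = (hi - lo) / n
    return sum(f(lo + (k + 0.5) * dx) for k in range(n)) * dx


def true_density(t):
    return 2.0 * t    # s(t) on [0, 1] (assumed for illustration)


def inflated(c):
    """Report c * s(t); it integrates to c, so it is improper when c != 1."""
    return lambda t: c * true_density(t)


def expected_log_score(report):
    return integrate(lambda t: true_density(t) * math.log(report(t)), 0.0, 1.0)


def expected_quadratic_score(report):
    cross = integrate(lambda t: 2.0 * true_density(t) * report(t), 0.0, 1.0)
    penalty = integrate(lambda t: report(t) ** 2, 0.0, 1.0)
    return cross - penalty


log_gain = expected_log_score(inflated(2.0)) - expected_log_score(inflated(1.0))
quad_gain = (expected_quadratic_score(inflated(2.0))
             - expected_quadratic_score(inflated(1.0)))
print(log_gain, quad_gain)
```

Doubling the density raises the expected logarithmic score by log 2, whereas it lowers the expected quadratic score, which is why only the logarithmic system needs a check that the report integrates to one.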

Finally, let us look at the "spherical" continuous scoring system. The inequality which we must establish in order to prove that this is an admissible scoring system is the following:

$$\frac{\int s(t)\,r(t)\,dt}{\sqrt{\int r(x)^2\,dx}} \;\le\; \frac{\int s(t)^2\,dt}{\sqrt{\int s(x)^2\,dx}}$$

This is readily transformed into

$$\int s(t)\,r(t)\,dt \;\le\; \sqrt{\int s(t)^2\,dt \int r(t)^2\,dt}$$

which is the very well-known Schwarz inequality, with equality holding if and only if s(t) = r(t) almost everywhere.

All three of the scoring systems which we have so far discussed have a common handicap in that they do not seem to take adequate account of the topology of the real line. That is to say, if two forecasters A and B each asserted that the true answer was certain to appear in some interval (one of them between 10 and 11), the true answer was 12, and B's interval lay closer to 12 than A's, common sense indicates that B had done a better job than A: he had put his distribution closer to the answer than A had. But none of the schemes we have discussed would give him any credit for that. If A and B both used rectangular distributions of unit width, they would both get identical scores, as follows:

Scoring System     A's Score     B's Score

Quadratic             -1            -1
Logarithmic           -∞            -∞
Spherical              0             0
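The identical scores can be reproduced directly from the three formulas f(t), g(t), and h(t). In the Python sketch below the intervals [1, 2] and [10, 11] and the true answer 12 are hypothetical values chosen for illustration, as are the function names.

```python
import math


def rectangular_scores(a, b, t):
    """Quadratic, logarithmic, and spherical scores for a rectangular
    density on [a, b] when the true answer is t."""
    height = 1.0 / (b - a)
    r_t = height if a <= t <= b else 0.0      # reported density at the truth
    integral_r2 = height * height * (b - a)   # integral of r(x)^2 dx
    quadratic = 2.0 * r_t - integral_r2
    logarithmic = math.log(r_t) if r_t > 0.0 else float("-inf")
    spherical = r_t / math.sqrt(integral_r2)
    return quadratic, logarithmic, spherical


far_miss = rectangular_scores(1.0, 2.0, 12.0)     # interval far from the truth
near_miss = rectangular_scores(10.0, 11.0, 12.0)  # interval close to the truth
print(far_miss, near_miss)
```

Because all three systems depend on the report only through the density at the true answer and the integral of the squared density, any two unit-width rectangles that miss the true answer receive the same scores, however near or far they are.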



We will now turn to a construction technique which can readily produce admissible scoring systems on continuous distributions which do not have this handicap.



IV. A SEQUENCE OF BETS CONSTRUCTION METHOD 



A decisionmaker is interested in forecasts which help him make more intelligent bets about the future. If we take this aphorism seriously, it suggests that we construct a scoring system for probabilistic forecasters on the basis of how well a gambler would do who made all wagers (and only those wagers) which offered a positive expected pay-off according to the distribution specified by the forecaster. This approach has been very successful as a technique for deriving the best-known discrete admissible scoring systems and gaining new insights into their properties. Let us try to apply it to derive continuous admissible scoring systems.

First of all, what is a typical bet in a continuous context? Let y be a real random variable, let x be some fixed real number, and let r be a fixed real number between zero and one. A typical bet would be for me to agree to pay you an amount r if y turns out to be greater than x, on condition that you pay me an amount 1 - r if y turns out to be less than or equal to x. This bet will look favorable to me if I believe the probability that y will be less than or equal to x is greater than r. My pay-off may be written symbolically as follows:

$$\ell_x(r, y) = \begin{cases} 1 - r & \text{if } y \le x \\ -r & \text{if } y > x \end{cases}$$

Similarly we could make bets where I receive the positive pay-off if y > x. Symbolically, such a bet could be written

$$u_x(r, y) = \begin{cases} -r & \text{if } y \le x \\ 1 - r & \text{if } y > x \end{cases}$$



This bet will look favorable to me if I believe the probability that y ≤ x is less than 1 - r; otherwise, it will look unfavorable to me. In order to avoid minor technical problems with infinity, let us suppose that it is certain (and known to everyone) that -L < y < L (where L is some fixed real number). Let us suppose that, for each x, we have a continuous spectrum of infinitesimal bets (proportional to the bets above) with r ranging from zero to one. Let R(x) denote our "subjective probability" that y ≤ x. Then we would take up all the "lower" bets (of the type denoted by the function ℓ_x above) for which r < R(x), and we would take up all the "upper" bets (of the type denoted by the function u_x above) for which r < 1 - R(x). If t denotes the "true value" which the random variable y assumes, then our net pay-off for lower and upper bets taken at this value of x will be

$$\int_0^{R(x)} \ell_x(r, t)\,dr \quad \text{and} \quad \int_0^{1-R(x)} u_x(r, t)\,dr$$



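respectively. We can calculate these integrals explicitly as follows:

$$\int_0^{R(x)} \ell_x(r, t)\,dr = \begin{cases} R(x) - \dfrac{R(x)^2}{2} & \text{if } t \le x \\[2ex] -\dfrac{R(x)^2}{2} & \text{if } t > x \end{cases}$$

$$\int_0^{1-R(x)} u_x(r, t)\,dr = \begin{cases} -\dfrac{(1 - R(x))^2}{2} & \text{if } t \le x \\[2ex] 1 - R(x) - \dfrac{(1 - R(x))^2}{2} & \text{if } t > x \end{cases}$$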



These net pay-offs themselves constitute bets. If we imagine these bets as being distributed continuously over the interval -L < x < L, then we see that our grand total pay-off function will be as follows:

$$F(t) = \int_{-L}^{+L} dx \int_0^{R(x)} \ell_x(r, t)\,dr + \int_{-L}^{+L} dx \int_0^{1-R(x)} u_x(r, t)\,dr$$

It should be very clear that this pay-off scheme constitutes an admissible scoring system, because we would perceive any deviation between the R(x) we reported and the R(x) we actually believed as being equivalent either to rejecting some bets which were favorable to us or accepting some which were unfavorable.

Let's take a closer look at the scoring system we have just derived. Carrying out some obvious transformations shows that it can be expressed in the following form:

$$F(t) = \int_{-L}^{t} \left[ 1 - R(x) \right] dx + \int_{t}^{+L} R(x)\,dx - \frac{1}{2} \int_{-L}^{+L} \left[ R^2(x) + (1 - R(x))^2 \right] dx$$
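The equivalence between the "sequence of bets" pay-off and the closed form can be checked numerically. The Python sketch below is an illustration only: the bound L = 1, the uniform reported distribution, the true value t = 0.25, the midpoint-rule integration, and the function names are all assumptions, not part of the paper.

```python
L = 1.0  # assumed bound: the outcome is known to lie in (-L, L)


def cdf(x):
    """Hypothetical reported R(x): uniform distribution on (-L, L)."""
    return min(1.0, max(0.0, (x + L) / (2.0 * L)))


def lower_bet(r, x, y):
    return 1.0 - r if y <= x else -r


def upper_bet(r, x, y):
    return -r if y <= x else 1.0 - r


def midpoint_sum(f, lo, hi, n):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    if hi <= lo:
        return 0.0
    dx = (hi - lo) / n
    return sum(f(lo + (k + 0.5) * dx) for k in range(n)) * dx


def payoff_from_bets(t, nx=400, nr=50):
    """Total pay-off from taking every bet that looks favorable under R."""
    def at_x(x):
        lower = midpoint_sum(lambda r: lower_bet(r, x, t), 0.0, cdf(x), nr)
        upper = midpoint_sum(lambda r: upper_bet(r, x, t), 0.0, 1.0 - cdf(x), nr)
        return lower + upper
    return midpoint_sum(at_x, -L, L, nx)


def closed_form(t, n=4000):
    """F(t) written as three integrals of the reported cumulative function."""
    left = midpoint_sum(lambda x: 1.0 - cdf(x), -L, t, n)
    right = midpoint_sum(lambda x: cdf(x), t, L, n)
    spread = midpoint_sum(lambda x: cdf(x) ** 2 + (1.0 - cdf(x)) ** 2, -L, L, n)
    return left + right - 0.5 * spread


t = 0.25  # hypothetical true value
print(payoff_from_bets(t), closed_form(t))
```

For the uniform report with L = 1 the closed form reduces to 5/6 - t²/2, which gives a convenient independent check on both computations.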

Suppose a respondent does not make use of the freedom he has to specify a distribution, but simply makes a "point estimate" d. That is to say, he reports the following cumulative function:

$$R(x) = \begin{cases} 0 & \text{if } x < d \\ 1 & \text{if } x \ge d \end{cases}$$

Then a simple calculation shows

$$F(t) = L - |t - d|$$

This is quite a satisfying result: the penalty he suffers is exactly the amount by which he missed the true answer. What could be more natural?
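The simple calculation behind the point-estimate result can be spelled out as follows. This is a worked sketch using only the expression for F(t) as three integrals of R(x) and the step-function report with its jump at d; it assumes -L < d < L. Since R(x) takes only the values 0 and 1, the integrand R²(x) + (1 - R(x))² equals 1 everywhere, so the last term of F(t) is always L.

$$
\begin{aligned}
t \ge d:\quad F(t) &= \int_{-L}^{d} 1\,dx + \int_{t}^{L} 1\,dx - \frac{1}{2}\int_{-L}^{L} 1\,dx = (d + L) + (L - t) - L = L - (t - d), \\
t < d:\quad F(t) &= \int_{-L}^{t} 1\,dx + \int_{d}^{L} 1\,dx - \frac{1}{2}\int_{-L}^{L} 1\,dx = (t + L) + (L - d) - L = L - (d - t).
\end{aligned}
$$

In both cases the score is L minus the distance between the point estimate and the true answer.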

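What score does an individual expect to make if he believes in and reports a cumulative distribution R(x)? This is easily calculated as follows:

$$
\begin{aligned}
\text{Respondent's Expected Score} &= \int_{-L}^{+L} R'(t)\,F(t)\,dt \\
&= \int_{-L}^{+L} R'(t)\,dt \int_{-L}^{t} \left[ 1 - R(x) \right] dx + \int_{-L}^{+L} R'(t)\,dt \int_{t}^{+L} R(x)\,dx - \frac{1}{2} \int_{-L}^{+L} \left[ R^2(x) + (1 - R(x))^2 \right] dx \\
&= \int_{-L}^{+L} \int_{x}^{+L} R'(t)\,(1 - R(x))\,dt\,dx + \int_{-L}^{+L} \int_{-L}^{x} R'(t)\,R(x)\,dt\,dx - \frac{1}{2} \int_{-L}^{+L} \left[ R^2(x) + (1 - R(x))^2 \right] dx \\
&= \frac{1}{2} \int_{-L}^{+L} \left[ R^2(x) + (1 - R(x))^2 \right] dx
\end{aligned}
$$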



Note the interchange of order of integration in the calculation above. It is easy to see that a respondent's maximum expectation occurs when he is completely certain of the right answer (he then expects to score L) and the minimum expectation occurs when he feels there is a 50% chance that y = -L and a 50% chance that y = +L (in which case R(x) = 1/2 throughout the interval and his expectation is L/2). If he feels that y is equally likely to assume any value between -L and +L, then his expectation is 2L/3. The reader may feel that the expected pay-off is not sufficiently sensitive to the precision of the respondent's estimate; but recall that any admissible scoring system may be multiplied by a positive constant or have any constant added to it. Thus we could "renormalize" to secure any degree of sensitivity desired.
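The expected scores for the certain and the uniform respondent can be checked numerically. The Python sketch below is illustrative only: it assumes L = 1, a point estimate at the hypothetical value d = 0.3, midpoint-rule integration, and invented function names. It computes the expectation both from the formula (1/2)∫[R² + (1 - R)²]dx and, for the uniform case, by integrating the score against the reported density.

```python
L = 1.0  # assumed bound on the outcome


def integrate(f, lo, hi, n=500):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    dx = (hi - lo) / n
    return sum(f(lo + (k + 0.5) * dx) for k in range(n)) * dx


def uniform_cdf(x):
    return (x + L) / (2.0 * L)


def uniform_pdf(t):
    return 1.0 / (2.0 * L)


def point_cdf(x, d=0.3):
    """Point estimate at the hypothetical value d."""
    return 1.0 if x >= d else 0.0


def score(cdf, t):
    """F(t) for a reported cumulative function."""
    left = integrate(lambda x: 1.0 - cdf(x), -L, t)
    right = integrate(lambda x: cdf(x), t, L)
    spread = integrate(lambda x: cdf(x) ** 2 + (1.0 - cdf(x)) ** 2, -L, L)
    return left + right - 0.5 * spread


def expected_by_formula(cdf):
    return 0.5 * integrate(lambda x: cdf(x) ** 2 + (1.0 - cdf(x)) ** 2, -L, L)


expected_direct = integrate(
    lambda t: uniform_pdf(t) * score(uniform_cdf, t), -L, L, n=200)
print(expected_direct, expected_by_formula(uniform_cdf),
      expected_by_formula(point_cdf))
```

Both routes give approximately 2L/3 for the uniform respondent, and the formula gives L for the point estimate.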

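To verify directly that F(t) is an admissible scoring system, let us introduce S(x) to stand for the true cumulative function of the random variable y. Then the absolute expected pay-off to a response R(x) is as follows:

$$
\begin{aligned}
\text{Absolute Expected Score} &= \int_{-L}^{+L} S'(t)\,F(t)\,dt \\
&= \int_{-L}^{+L} S'(t)\,dt \int_{-L}^{t} (1 - R(x))\,dx + \int_{-L}^{+L} S'(t)\,dt \int_{t}^{+L} R(x)\,dx - \frac{1}{2} \int_{-L}^{+L} \left[ R^2(x) + (1 - R(x))^2 \right] dx \\
&= \int_{-L}^{+L} \left\{ (1 - S(x))(1 - R(x)) + S(x)\,R(x) - \frac{1}{2} R^2(x) - \frac{1}{2} (1 - R(x))^2 \right\} dx \\
&= \frac{1}{2} \int_{-L}^{+L} \left[ S^2(x) + (1 - S(x))^2 \right] dx - \int_{-L}^{+L} \left[ S(x) - R(x) \right]^2 dx
\end{aligned}
$$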

The first integral on the right-hand side above depends only on the true distribution S(x); the second integral is obviously minimized if and only if the respondent's distribution equals the true distribution almost everywhere. This confirms that F(t) is, indeed, an admissible scoring system.
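The decomposition into a term depending only on S and a penalty ∫[S(x) - R(x)]²dx can be illustrated numerically. In the Python sketch below the true distribution (uniform on (-L, L) with L = 1), the misreported cumulative function R = S², the midpoint-rule integration, and all names are assumptions made for illustration.

```python
L = 1.0  # assumed bound on the outcome


def integrate(f, lo, hi, n=500):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    dx = (hi - lo) / n
    return sum(f(lo + (k + 0.5) * dx) for k in range(n)) * dx


def true_cdf(x):
    return (x + L) / (2.0 * L)     # S(x): uniform on (-L, L)


def true_pdf(t):
    return 1.0 / (2.0 * L)         # S'(t)


def reported_cdf(x):
    return true_cdf(x) ** 2        # R(x): deliberately different from S(x)


def score(cdf, t):
    """F(t) for a reported cumulative function."""
    left = integrate(lambda x: 1.0 - cdf(x), -L, t)
    right = integrate(lambda x: cdf(x), t, L)
    spread = integrate(lambda x: cdf(x) ** 2 + (1.0 - cdf(x)) ** 2, -L, L)
    return left + right - 0.5 * spread


def expected_payoff(cdf):
    """Expected score of a report when the outcome is drawn from S."""
    return integrate(lambda t: true_pdf(t) * score(cdf, t), -L, L, n=200)


best = 0.5 * integrate(
    lambda x: true_cdf(x) ** 2 + (1.0 - true_cdf(x)) ** 2, -L, L)
penalty = integrate(lambda x: (true_cdf(x) - reported_cdf(x)) ** 2, -L, L)
misreport = expected_payoff(reported_cdf)
honest = expected_payoff(true_cdf)
print(honest, misreport, best - penalty)
```

For this pair of distributions the penalty is 1/15, so the misreport expects about 0.6 while the honest report expects about 2/3.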

The first integral on the right-hand side is of interest in itself, for it gives a general expression for the maximum score which an individual can expect against a given distribution. It can be interpreted as a constant minus the integral of the variance of the two-alternative distribution "greater than x or less than x" across all x. The reader may suspect that there is some deep relationship between this quantity and the variance of the distribution represented by S(t) itself. Note, however, that if we multiply the random variable y by a positive constant K, then this quantity is multiplied by K while the variance of y is multiplied by K².




V. GENERALIZATIONS 



It is quite clear from the nature of the "sequence of bets" construction method that we can generate other admissible scoring systems by varying the functions ℓ_x(r, t) and u_x(r, t) in the expression

$$\int_{-L}^{+L} dx \int_0^{R(x)} \ell_x(r, t)\,dr + \int_{-L}^{+L} dx \int_0^{1-R(x)} u_x(r, t)\,dr$$

In fact, if φ(x, r) and ψ(x, r) are any positive functions of x and r whatsoever, then we generate an admissible scoring system on continuous distributions by taking

$$\ell_x(r, y) = \begin{cases} (1 - r)\,\varphi(x, r) & \text{if } y \le x \\ -r\,\varphi(x, r) & \text{if } y > x \end{cases}$$

$$u_x(r, y) = \begin{cases} -r\,\psi(x, r) & \text{if } y \le x \\ (1 - r)\,\psi(x, r) & \text{if } y > x \end{cases}$$



in the expression above. Whatever functions we use, however, we will always come out with a scoring system in terms of the cumulative probability function rather than the probability density function. This is because of the form of the bets we are permitting: they are all bets that y will fall in a given half-line (essentially). One could think of admissible scoring systems based on the probability density function (like the three discussed in Sections II and III above) as being generated by sequences of bets placed on whether or not y falls in a sequence of smaller and smaller intervals. Which class of scoring systems is more appropriate depends on the details of the particular application you have in mind.



VI. ASSESSING REALISM



Repeated experiments have indicated that it is a common human characteristic to overstate high probabilities and understate low ones. Or, putting it another way, to overestimate the degree of one's knowledge. Or, putting it still another way, to report subjective probability distributions which are too "tight." Some individuals exhibit the opposite form of behavior, however, and tend to hedge too much rather than too little. In any information system involving subjective probability estimates we would like to detect when either of these behavior patterns is present and provide appropriate feedback in order to help the estimators improve their behavior.

The way this has been done in Rand CAAPT implementations in the past is that the individual's external validity graph has been estimated (generally using a one-parameter linear least squares technique) and feedback has been based on this estimate. It is hard to see what would be the continuous analogue of an external validity graph, however. Therefore I believe that we should provide feedback to respondents (urging them to hedge more or hedge less) strictly on the basis of whether they score worse or better (over a number of questions) than they expected.

Let me give a heuristic justification for this approach. If an individual gets a significantly worse score than he expected, what does that indicate? It indicates that events he thought were relatively likely did not occur as frequently as he expected, while events he thought were relatively unlikely occurred more frequently than he expected. He would have done better if he had hedged his bets more. On the other hand, if his score is better than he expected, that indicates just the reverse, and he would have made an even better score if he had not hedged his bets so much.




Of course, an individual will rarely make exactly his expected score. In order to determine whether the difference between his expected score and his achieved score is great enough to justify corrective feedback, it seems reasonable to calculate the variance of his expected score and provide feedback only if his actual score falls more than two standard deviations (say) away from his expected score.

To make these ideas definite, let us work out specific formulas for the case of the quadratic scoring system (f(t) derived in Section II). Suppose an individual answers a set of n questions, giving a response r_i(x) on the i-th question. His expected score (m_i) on the i-th question is then given by

$$m_i = \int \left[ r_i(x) \right]^2 dx .$$

An easy calculation shows that the variance on the i-th question (σ_i²) is given by

$$\sigma_i^2 = 4 \left\{ \int \left[ r_i(x) \right]^3 dx - \left[ \int \left[ r_i(x) \right]^2 dx \right]^2 \right\} .$$



If the questions are independent of one another, then the variance of the total score will be simply the sum of the variances of the scores on the individual questions. In actual practice, the questions will probably not be completely independent of one another, and this will mean that the variance in the total score will be somewhat greater. This does not undermine our basic concept of using the variance calculated as though the questions are independent as a standard for providing feedback; it only means we must not try to be too precise in making statements about what the individual viewed his chance of making such a large or such a small score as being. The expected score on the whole test will be the sum of the expected scores on the individual questions, whether the questions are independent or not. So let us take



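$$m = \sum_{i=1}^{n} m_i$$

$$\sigma^2 = \sum_{i=1}^{n} \sigma_i^2$$

$$T = \sum_{i=1}^{n} 2\,r_i(t_i) - \sum_{i=1}^{n} \int \left[ r_i(x) \right]^2 dx$$

where t_i denotes the true answer to the i-th question, so that T is the total score actually achieved.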

We might then provide feedback in the following form:

If T < m - 4σ: "Your score is significantly worse than you expected. You would have done much better if you had put more spread into your responses and not claimed to be so certain about the correct values."

If m - 4σ ≤ T < m - 2σ: "Your score is somewhat worse than you expected. You would have done better if you had hedged more in your responses."

If m - 2σ ≤ T ≤ m + 2σ: No corrective feedback.

If m + 2σ < T ≤ m + 4σ: "Your score is somewhat better than you expected. It would be even better if you had not hedged your responses quite so much."

If m + 4σ < T: "Your score is significantly better than you expected. You would have made an even better score if you had had more confidence and had not put so much spread into your responses."
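The feedback rule can be sketched in Python as follows. This is a minimal illustration, not the paper's program: the function names, the short category labels, and the two-segment example densities are hypothetical. The per-question moments assume the quadratic score f(t) = 2r(t) - ∫r², for which the mean is ∫r² and the variance is 4[∫r³ - (∫r²)²] under the respondent's own distribution, and the questions are treated as independent.

```python
def quadratic_moments(pieces):
    """Expected score and variance for a piecewise-constant density.

    pieces: list of (probability, width) pairs, one per constant segment.
    """
    second = sum(p * p / w for p, w in pieces)         # integral of r^2
    third = sum(p ** 3 / (w * w) for p, w in pieces)   # integral of r^3
    return second, 4.0 * (third - second ** 2)


def feedback_category(total, expected, sigma):
    """Classify the achieved total score against the expected total score."""
    if total < expected - 4.0 * sigma:
        return "significantly worse"
    if total < expected - 2.0 * sigma:
        return "somewhat worse"
    if total <= expected + 2.0 * sigma:
        return "no corrective feedback"
    if total <= expected + 4.0 * sigma:
        return "somewhat better"
    return "significantly better"


# Two hypothetical questions, each answered with a two-segment density:
# half the mass spread over width 1.0 and half over width 0.25.
answers = [[(0.5, 1.0), (0.5, 0.25)], [(0.5, 1.0), (0.5, 0.25)]]
moments = [quadratic_moments(a) for a in answers]
m = sum(mean for mean, _ in moments)
sigma = sum(var for _, var in moments) ** 0.5
print(m, sigma, feedback_category(m - 5.0, m, sigma))
```

In this example m = 2.5 and σ² = 4.5, so a total score 5 points below expectation falls between two and four standard deviations low and triggers the milder of the two "hedge more" messages.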



VII. EXPERIMENTAL IMPLEMENTATION



We seem to have some sort of answer to every question I can think of about CAAPT for continuous distributions. Therefore we should proceed at once to some sort of demonstration program so that we can garner some informal, empirical experience before investing in a program designed for serious use and systematic test. JOSS seems like a good vehicle for such a demonstration program for the following reasons:

1. It's the only system in which I am proficient.

2. It lends itself to outside demonstrations.

3. Its limitation to alpha-numeric I/O is a limitation shared by many systems which might be good vehicles for follow-on programs (e.g., Jerry Shure's system).

There are some serious difficulties with the JOSS system, however:

1. Its I/O speed is very slow for this application.

2. Lack of graphic output makes some natural feedback schemes infeasible.

Even if the above difficulties mean that the JOSS program is ineffective, however, the work we put into it will not be entirely wasted: much of the work required in preparing the JOSS program will be required for any continuous CAAPT routine. So let us charge ahead with the outline of a plan for such an experimental JOSS implementation.

The scoring system used will be the quadratic scoring system introduced in Section II above. Feedback (at the end of a set of questions) will be based on the individual's expected score and variance (as discussed in Section VI above). The respondent will have two options:

1. Feedback after every question and at the end of the question set.

2. Feedback at the end of test only.

Questions will be in sets of up to nine. The i-th question in a set will be "part 90 + i." The true answer to the i-th question will be T(i); the number of questions in the set will be T(0); the normalization number (to be multiplied by the raw score on the question) will be U(i). The responses will be demanded in the following format:

There is a .01 chance the true answer is less than:
There is a .20 chance the true answer is less than:
There is a .40 chance the true answer is less than:
There is a .60 chance the true answer is less than:
There is a .80 chance the true answer is less than:
There is a .99 chance the true answer is less than:

The above format will be used on the first question in a set only. Thereafter the following format will be used:

... .01 ... less than:
... .20 ... less than:
... .40 ... less than:
... .60 ... less than:
... .80 ... less than:
... .99 ... less than:

The six percentile breaks elicited on the i-th question would be stored as R(i,1), R(i,2), ..., R(i,6). In calculating the reward function, we hypothesize that the probability density function is constant between the specified percentile breaks. This hypothesis enables us to make easy computations and provide feed-back well suited to alpha-numeric output. Specifically, if we ignore the density less than .01 and greater than .99, we can calculate the integral in the scoring function as follows:

$$Q = \frac{.0361}{R(i,2) - R(i,1)} + \frac{.0361}{R(i,6) - R(i,5)} + \sum_{j=2}^{4} \frac{.04}{R(i,j+1) - R(i,j)}$$

We will also temporarily store the probability densities as follows:

$$D(1) = \frac{.19}{R(i,2) - R(i,1)}$$

$$D(2) = \frac{.20}{R(i,3) - R(i,2)}$$

$$D(3) = \frac{.20}{R(i,4) - R(i,3)}$$

$$D(4) = \frac{.20}{R(i,5) - R(i,4)}$$

$$D(5) = \frac{.19}{R(i,6) - R(i,5)}$$
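The quantities Q and D(1), ..., D(5) are straightforward to compute from the six percentile breaks. The following Python sketch is only an illustration of that arithmetic: the function name, the validation step, and the sample breaks (a temperature forecast in degrees) are hypothetical, and the paper's own implementation was planned for JOSS, not Python.

```python
# Probability mass assumed between consecutive percentile breaks
# (.01 to .20, .20 to .40, .40 to .60, .60 to .80, .80 to .99).
SEGMENT_MASS = (0.19, 0.20, 0.20, 0.20, 0.19)


def densities_and_q(breaks):
    """Return ([D(1), ..., D(5)], Q) for six increasing percentile breaks."""
    if len(breaks) != 6 or any(hi <= lo for lo, hi in zip(breaks, breaks[1:])):
        raise ValueError("need six strictly increasing percentile breaks")
    widths = [hi - lo for lo, hi in zip(breaks, breaks[1:])]
    # Constant density on each segment: mass divided by width.
    d = [mass / width for mass, width in zip(SEGMENT_MASS, widths)]
    # Q adds mass^2 / width over the segments (.0361 = .19^2, .04 = .20^2).
    q = sum(mass * mass / width for mass, width in zip(SEGMENT_MASS, widths))
    return d, q


d, q = densities_and_q([50.0, 60.0, 64.0, 68.0, 72.0, 85.0])
print(d, q)
```

A respondent who reports equal or decreasing breaks would cause a division by zero or a negative density, so a real program would need some check of this kind before scoring.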



After all six of the R's have been elicited from a respondent, we will provide feed-back of the following form:

IF THE TRUE ANSWER IS:                                    YOU WILL SCORE:

less than R(i,1)                                          -U(i)·Q
greater than or equal to R(i,1) but less than R(i,2)      U(i)·[D(1) - Q]
greater than or equal to R(i,2) but less than R(i,3)      U(i)·[D(2) - Q]
greater than or equal to R(i,3) but less than R(i,4)      U(i)·[D(3) - Q]
greater than or equal to R(i,4) but less than R(i,5)      U(i)·[D(4) - Q]
greater than or equal to R(i,5) but less than R(i,6)      U(i)·[D(5) - Q]
greater than or equal to R(i,6)                           -U(i)·Q

The underlined quantities above will, of course, be expressed in numerical terms. The above format will be one of two alternative feed-back formats. The more condensed version, used after the first question is over with, will be the following:

IF THE TRUE ANSWER IS:          YOU WILL SCORE:

less than                       -U(i)·Q
  R(i,1)
between                         U(i)·[D(1) - Q]
  R(i,2)
between                         U(i)·[D(2) - Q]
  R(i,3)
between                         U(i)·[D(3) - Q]
  R(i,4)
between                         U(i)·[D(4) - Q]
  R(i,5)
between                         U(i)·[D(5) - Q]
  R(i,6)
greater than or equal to        -U(i)·Q

The respondent will then be given an opportunity to revise his answer (if he wishes). If he is satisfied that his response is the best he can make, he will signal that fact to the machine. The machine will then report the true answer to the individual along with the score he achieved.

In order to form the basis for analytical feed-back at the end of the question set, we will keep a tally of true total score, expected total score, and expected variance in total score as quantities L, M, and N respectively. Therefore before we proceed to the next question we must carry out the following operations:



S(i) = True score on question i

$$E(i) = \text{Expected score on question } i = .19\,U(i)\,[D(1) + D(5)] + .2\,U(i) \sum_{j=2}^{4} D(j) - U(i)\,Q$$

Set L = L + S(i)

Set M = M + E(i)

$$\text{Set } N = N + .02\,[U(i)\,Q]^2 + .19\,U(i)^2 \left[ (D(1) - Q)^2 + (D(5) - Q)^2 \right] + .2\,U(i)^2 \sum_{j=2}^{4} \left[ D(j) - Q \right]^2 - E(i)^2$$



After all questions in the set have been answered the machine will provide one of the five feed-back messages specified in Section VI above, depending on whether L < M - 4√N, M - 4√N ≤ L < M - 2√N, M - 2√N ≤ L ≤ M + 2√N, M + 2√N < L ≤ M + 4√N, or M + 4√N < L.






